The weak evidence of lip print analysis for sexual dimorphism in forensic dentistry: a systematic literature review and meta-analysis

This study aimed to assess the prevalence of lip print patterns among males and females, and to test the diagnostic accuracy of lip pattern analysis for sexual dimorphism in forensic dentistry. A systematic literature review was performed following the PRISMA guidelines. The search was performed in six primary databases and three databases to cover part of the grey literature. Observational and diagnostic accuracy studies that investigated lip print patterns through cheiloscopy for sexual dimorphism were selected. Risk of bias was assessed with the Joanna Briggs Institute (JBI) tool. A random-effects proportion meta-analysis was fitted to pool the accuracy of cheiloscopy. The odds of correctly identifying males and females were assessed through a random-effects meta-analysis. The GRADE approach was used to assess certainty of evidence. The search found 3,977 records, published between 1982 and 2019. Seventy-two studies fulfilled the eligibility criteria and were included in the qualitative analysis (n = 22,965 participants), and twenty-two studies were sampled for meta-analysis. Fifty studies had low risk of bias. Suzuki and Tsuchihashi's technique was the most prevalent among studies. The accuracy of sexual dimorphism through cheiloscopy ranged between 52.7 and 93.5%, while the pooled accuracy was 76.8% (95% CI = 65.8; 87.7). There was no difference between the accuracy to identify males or females (OR = 0.71; 95% CI = 0.26; 1.99). The large spectrum of studies on sexual dimorphism via cheiloscopy depicted accuracy rates that raise uncertainty and concern. The unclear performance of the technique could lead to erroneous forensic practice.

www.nature.com/scientificreports/

proper agreement during the following phases. The reviewers analyzed 20% of the studies based on the eligibility criteria. The target agreement rate was at least 81% (Kappa ≥ 0.81). After training, they were able to perform study selection based on title reading (reviewers were not blinded to the authorship and year of publication). The next phase consisted of abstract reading and systematic selection. Studies without abstracts available were not excluded in this phase. Finally, the selected studies underwent full-text reading. Studies excluded in this phase had their reason for exclusion registered separately. Throughout the study selection process, a third reviewer was available to resolve any lack of agreement between the two reviewers. Studies for which the full text could not be retrieved were requested from the authors by e-mail. Additional support was obtained from the Brazilian Program of Bibliographic Commutation (COMUT) and from the Brazilian Institute of Information on Science and Technology (IBICT). Studies published in languages other than English, Portuguese and Spanish had their full text translated.
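The agreement threshold above can be illustrated with a minimal sketch of Cohen's kappa for two reviewers. The screening decisions below are hypothetical and are not taken from the review; the text does not state which kappa variant was used, so unweighted Cohen's kappa is an assumption.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical calibration round (assumption, for illustration only).
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude",
              "include", "exclude", "exclude", "exclude", "exclude"]
reviewer_2 = ["include", "include", "exclude", "exclude", "exclude",
              "exclude", "exclude", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f}; meets 0.81 threshold: {kappa >= 0.81}")
```

In this hypothetical round the reviewers agree on nine of ten records, yet kappa stays below 0.81 because chance agreement is high when most records are excluded.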

Data extraction. Data extraction was performed by two examiners independently. A template Microsoft Office Excel (Microsoft Ltd, Washington, USA) sheet was used to ensure standardized data extraction. The following data were extracted: (I) identifying information: authorship, year and country of publication of the eligible studies; (II) sample profile: size, age interval, sex distribution and geographic region of origin; (III) cheiloscopy-related data: technique used for analysis, general and sex-related lip print patterns, and sensitivity and specificity of cheiloscopy for sexual dimorphism. Data extraction was supervised by a third reviewer and a forensic odontologist.
The corresponding authors were contacted by email (up to three times over two weeks) to obtain relevant information in case of missing or unclear data.

Risk of bias. The risk of bias and the individual methodological quality of the eligible studies were assessed by means of the JBI Critical Appraisal tool for observational cross-sectional 19 or diagnostic test accuracy 20 studies. Following PRISMA 16, two reviewers assessed the risk of bias. Lack of agreement between reviewers for any of the questions within the JBI tool was resolved by a third examiner.
The percentage of positive answers to the questions led to the final score of the studies. Studies that scored up to 49% of positive answers were classified as "high risk of bias". Studies with positive answers between 50 and 69% were classified as "moderate risk of bias", while studies that scored positive answers above 70% were classified as "low risk of bias".
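The scoring rule above can be sketched as a small function. Two points are assumptions not stated in the text: questions marked "not applicable" are dropped from the denominator, and a score of exactly 70% is treated as low risk. The function name and answer labels are hypothetical.

```python
def classify_risk_of_bias(answers):
    """Classify a study from its JBI answers ('yes', 'no', 'unclear', 'na').

    Assumptions: 'na' answers are excluded from the denominator, and a score
    of exactly 70% counts as low risk (the text says "above 70%").
    """
    applicable = [a.lower() for a in answers if a.lower() != "na"]
    if not applicable:
        raise ValueError("at least one applicable question is required")
    pct_yes = 100.0 * sum(a == "yes" for a in applicable) / len(applicable)
    if pct_yes < 50:
        label = "high risk of bias"
    elif pct_yes < 70:
        label = "moderate risk of bias"
    else:
        label = "low risk of bias"
    return pct_yes, label

# Hypothetical study with eight applicable questions, six answered 'yes'.
print(classify_risk_of_bias(["yes"] * 6 + ["no", "unclear"]))
```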

Summary measures
The outcomes were explored by means of descriptive analysis and were presented in narrative tables. The prevalence of lip print patterns was reported according to sex and compared between males and females. More specifically, this analysis was performed using a meta-analytical approach of proportions, in which combined prevalence estimates for males and females were estimated using random effects and the Freeman-Tukey double arcsine transformation to stabilize the variances 21. The heterogeneity between groups was estimated to assess the differences of lip print patterns between males and females. A meta-analysis was fitted for each combination of lip print pattern, lip side (right/left) and lip position (upper/lower). Studies with missing information about lip print pattern, lip side and lip position were not included in the meta-analysis. The meta-analysis was performed separately for the two predominant techniques found in the systematic literature review: Suzuki & Tsuchihashi (1970) and Renaud (1973).
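A minimal sketch of a random-effects meta-analysis of proportions with the Freeman-Tukey double arcsine transformation is given below. It assumes the DerSimonian-Laird estimator for the between-study variance and Miller's inverse with the harmonic mean sample size for back-transformation; the text specifies neither, and the counts are hypothetical. It is an illustration of the general approach, not a reproduction of the Stata analysis.

```python
import math

def ft_transform(events, n):
    """Freeman-Tukey double arcsine transform and its approximate variance."""
    t = (math.asin(math.sqrt(events / (n + 1)))
         + math.asin(math.sqrt((events + 1) / (n + 1))))
    return t, 1.0 / (n + 0.5)

def ft_back_transform(t, n):
    """Miller's inverse; values outside the attainable range map to 0 or 1."""
    t_min = math.asin(math.sqrt(1.0 / (n + 1)))
    t_max = math.asin(math.sqrt(n / (n + 1))) + math.pi / 2
    if t <= t_min:
        return 0.0
    if t >= t_max:
        return 1.0
    s = math.sin(t)
    inner = max(0.0, 1.0 - (s + (s - 1.0 / s) / n) ** 2)
    return 0.5 * (1.0 - math.copysign(1.0, math.cos(t)) * math.sqrt(inner))

def pool_random_effects(effects, variances):
    """DerSimonian-Laird pooling: (pooled, standard error, tau^2, I^2 in %)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0
    return pooled, se, tau2, i2

def pooled_prevalence(studies):
    """studies: (events, n) pairs for one pattern/side/position/sex cell."""
    transformed = [ft_transform(x, n) for x, n in studies]
    pooled_t, se, _, i2 = pool_random_effects(
        [t for t, _ in transformed], [v for _, v in transformed])
    n_harmonic = len(studies) / sum(1.0 / n for _, n in studies)
    estimate = ft_back_transform(pooled_t, n_harmonic)
    lower = ft_back_transform(pooled_t - 1.96 * se, n_harmonic)
    upper = ft_back_transform(pooled_t + 1.96 * se, n_harmonic)
    return estimate, lower, upper, i2

# Hypothetical counts of one lip print type among males in three studies.
est, lo, hi, i2 = pooled_prevalence([(30, 100), (45, 120), (20, 80)])
print(f"pooled prevalence = {est:.3f} (95% CI {lo:.3f}; {hi:.3f}), I2 = {i2:.0f}%")
```

Running the same function on the female counts for the same cell and comparing the two pooled estimates mirrors the between-group comparison described above.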
The diagnostic accuracy of the cheiloscopy technique for sexual dimorphism was tested separately for males and females. The absolute number of correct matches and mismatches between reference and target lips was extracted from each eligible study and a random-effects meta-analysis was fitted. To avoid the exclusion of studies that reported zero matches or mismatches, a continuity correction of 0.5 was applied in these cases. Studies that provided the number of hits and errors for males and females separately were included in a meta-analysis evaluating whether the accuracy of cheiloscopy differed between males and females. To assess that, the odds ratio for correctly identifying males compared to females was calculated, indicating whether the method was more or less accurate for sexual dimorphism among males than among females.
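The odds ratio described above can be sketched for a single study as follows. Adding 0.5 to all four cells whenever any cell is zero, and using a Wald interval on the log scale, are common conventions assumed here; the text only states that a 0.5 correction was applied. The function name and the counts are hypothetical.

```python
import math

def odds_ratio_male_vs_female(male_hits, male_errors, female_hits,
                              female_errors, correction=0.5):
    """Odds of a correct classification in males relative to females.

    Assumption: the continuity correction is added to every cell, and only
    when at least one cell is zero.
    """
    cells = [male_hits, male_errors, female_hits, female_errors]
    if any(c == 0 for c in cells):
        cells = [c + correction for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    # Wald 95% confidence interval computed on the log odds ratio.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper

# Hypothetical study: 40/50 males and 30/50 females correctly classified.
or_value, ci_low, ci_high = odds_ratio_male_vs_female(40, 10, 30, 20)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}; {ci_high:.2f})")
```

An odds ratio above 1 favours males and below 1 favours females; an interval that includes 1 indicates no detectable difference in that study.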
For meta-analyses that included at least 10 studies, publication bias was investigated through Egger's test by a linear regression of the effect measure on the size of the study 22. Statistical analyses were performed with Stata version 16.1 (StataCorp LLC, College Station, TX, USA) software. Significance level was set at 5%.
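A minimal sketch of Egger's regression is shown below in its classic formulation, in which the standardized effect (effect divided by its standard error) is regressed on precision (the inverse of the standard error) and the intercept is tested against zero. Whether the Stata analysis used exactly this formulation is an assumption, and the ten effects and standard errors are hypothetical. The sketch returns the t statistic rather than a p-value; the p-value would come from a t distribution with k - 2 degrees of freedom.

```python
import math

def egger_regression(effects, std_errors):
    """Ordinary least squares of effect/SE on 1/SE.

    Returns (intercept, slope, standard error of intercept, t statistic, df).
    An intercept far from zero suggests small-study effects.
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching inputs")
    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect
    mean_x, mean_y = sum(x) / k, sum(y) / k
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same standard error")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(residual_ss / (k - 2) * (1.0 / k + mean_x ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, slope, se_intercept, t_stat, k - 2

# Hypothetical meta-analysis of ten studies in which smaller studies
# (larger standard errors) report larger effects.
ses = [0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.30, 0.35, 0.40, 0.50]
effs = [0.20, 0.22, 0.25, 0.30, 0.32, 0.45, 0.55, 0.70, 0.80, 1.10]
intercept, slope, se_int, t_stat, df = egger_regression(effs, ses)
print(f"intercept = {intercept:.2f}, t = {t_stat:.2f}, df = {df}")
```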
Certainty of evidence (GRADE approach). Certainty of evidence and strength of recommendation were assessed with the Grading of Recommendation, Assessment, Development, and Evaluation (GRADE) approach. According to this system, diagnostic accuracy studies start at a high level of certainty and can be downgraded based on risk of bias, inconsistency, indirect evidence, imprecision, and publication bias. The level of certainty among the identified evidence was characterized as high, moderate, low, or very low 23.

Results
Study selection. The first phase of study selection resulted in 3,977 studies throughout the nine electronic databases. After removing duplicates, the remaining number of studies was 2,956. Exclusions based on title and abstract reading reduced the sample to 98 studies eligible for full-text reading. Six studies did not fulfill the inclusion criteria (Appendix 1), and full texts were not found for twenty studies, even after trying to contact the authors or libraries. Finally, a total of 72 studies were selected for qualitative analysis 1,2,4-6,10-15,24-84. Quantitative analysis of the accuracy of cheiloscopy for sexual dimorphism included seven studies 1 [...] studies 10,11,14,28,30,32,34,36,38,42,43,51,56,60,61,63,82 were considered in the analyses of the prevalence of lip print patterns (Fig. 1).
Characteristics of eligible studies. The [...] 24, Iran (n = 1) 63, Romania (n = 1) 41, Croatia (n = 1) 65, Saudi Arabia (n = 1) 28 and Poland (n = 1) 80. The total sample of participants across studies was 22,965. The age interval of the participants ranged from 1 to 83 years (Table 2). Fourteen studies did not describe the ethical aspects adopted in the study. None of the cross-sectional studies reported the STROBE checklist as the guideline of choice. [...] (Tables 3 and 4). All the questions in the JBI tool for cross-sectional studies were applicable, while three questions were not applicable in the JBI tool for diagnostic test accuracy studies.
Concerning diagnostic test accuracy studies, questions #1 and #2 were marked as 'unclear' or 'no' for all studies 1,5,25,48,49,54,80. The first question checked whether the sample was selected consecutively or randomly. The second question was related to the methodological design of the studies; all studies recruited participants that

Synthesis of results.
Primary outcome: accuracy for sexual dimorphism. Seven studies 1,4,25,48,49,54,80 were included in the meta-analysis of the accuracy of lip prints for sexual dimorphism. From these seven studies, nine accuracy assessments were included in the meta-analysis, since the study by Topczyłko et al. 80 evaluated three different methods. The overall accuracy was 76.8% (95% CI = 65.8; 87.7, I² = 97%) (Fig. 2). Individual accuracy rates ranged from 52.7 to 93.5%. Six out of the seven studies included in the accuracy meta-analysis provided the number of hits and errors according to the sex of the participant and were included in a meta-analysis that assessed if the odds of distinguishing males [...] 48, described differences for sexual dimorphism (Fig. 3). The first showed 77% higher odds of identifying females compared to males (OR = 0.23; 95% CI = 0.27; 0.31), while the second showed sixfold higher odds of identifying males compared to females (OR = 6.00; 95% CI = 1.17; 30.72). One study 80 did not report samples divided by sex and was not included in the analysis.
Secondary outcome: prevalence of lip prints. According to the technique of Suzuki and Tsuchihashi (1970), lip print pattern type 2 was the most prevalent (> 30%), while type 5 was the rarest pattern (< 3%) (Table 5). Sex differences based on prevalence rates were not detected. Publication bias was identified for studies analyzing lip print type 1' for the upper and lower dental arches on the right side, for lip print type 4 for the upper arch on the left and right sides, and for lip print type 4 for the lower arch on the right side. Sex differences were not observed using Renaud's (1973) technique. According to this technique, the most prevalent pattern was type C (> 12%), while type I was the least prevalent (< 1%) (Table 6).
Certainty of evidence. The GRADE approach showed low certainty of evidence. The limiting aspects were the lack of consistency between the estimated effects and the lack of overlap of confidence intervals, as evidenced by the high heterogeneity between the included studies (Table 7).

Discussion
Dental analysis, within forensic dentistry, figures as an alternative for human identification, especially because of the resistance of human teeth to high temperature and cadaveric alterations 85. Over time, several forensic applications were studied for the use of dental/oral evidence. Apart from human identification, bite mark analysis 86; anthropological estimation of age 87, sex 88, stature 89 and ancestry 90; rugoscopy 91; and cheiloscopy 92 currently represent fields of forensic odontology. While some fields developed with strong scientific basis and broad legal acceptance (e.g. human identification), other fields remained controversial and lacked high-level evidence-based confirmation; this is the case of cheiloscopy. From the perspective of forensic practice, the alleged contribution of cheiloscopy relies on the possibility of retrieving identifying information (such as sex) about a suspect from visible or latent lip prints left at a crime scene 93. Two main controversies might arise from cheiloscopy: (I) in crime scene investigations, the existing lip print left on objects or other surfaces could provide stronger evidence for human identification through DNA extraction than through comparative analysis of furrows; (II) studies on cheiloscopy are generally observational, cross-sectional and with questionable settings that include different techniques, underlying surfaces and registration materials (e.g. lipsticks and powdered metals). In this scenario, several questions are pertinent: Why is the scientific literature so rich in studies on cheiloscopy for sexual dimorphism? How often is cheiloscopy used by forensic dentists in practice? But especially (as claimed in many studies): Is cheiloscopy really useful to distinguish males and females in forensic dentistry? To date, there is no antemortem database of lip patterns worldwide (even in clinical dentistry). Moreover, registering the lips with photographs or other tools is rare, so the application of cheiloscopy for human identification is limited from the beginning. Sex estimation from lip prints could be an interesting asset to the armamentarium of forensic dentists, but again its application in practice is limited, especially because dental human identification is mainly necessary in challenging cases that involve charred bodies and skeletal remains 94, in which lips are usually destroyed. Additionally, sexual dimorphism should be assessed from body structures scientifically known for their anthropological reliability, namely the pelvic bones and skull 95.
The evidence brought through the present systematic review was extracted from 72 studies that sampled 22,965 individuals. Out of the studies, 70% (n = 52) 1,4-6,10-15,25,27,30,31,35-40,43-46,48-50,53-57,59-62,64,66-75,78,79,81,83,84 were from India. At first sight, the quality of studies appeared acceptable when it comes to assessment of the risk of bias (nearly 70% had low risk of bias). These outcomes, combined with the proportion of studies that detected sex differences based on lip pattern (67%), could lead to dangerous interpretations from readers who are not familiar with systematic reviews. A closer look at the quantified outcomes of the most prevalent techniques (Suzuki & Tsuchihashi, 1970, n = 64, 88%; Renaud et al., 1973, n = 4, 5%), however, reveals a lack of statistical significance (p > 0.05) for each lip pattern between males and females. The analysis performed per pattern clarifies the scenario, as most of the studies in the field only test sexual dimorphism by comparing generalized (combined) patterns within sex groups (males vs. females). Further, the limitations of cheiloscopy for sexual dimorphism are corroborated by GRADE assessment outcomes, which pooled seven studies (10% of selected studies) and 1,547 participants to clearly point out high heterogeneity (> 75%). The heterogeneity might be explained mainly by the fact that none of the 72 observational eligible studies reported data using scientifically established guidelines, namely STROBE. The resulting analysis via GRADE suggested low level of general quality and critical level of importance. Considering the diagnostic accuracy of cheiloscopy, mean outcomes point to 76%, which indicates that about one in every four analyses of sexual dimorphism through lip patterns will result in a wrong classification. Stronger outcomes would necessarily require a higher level of accuracy and a lower level of heterogeneity across studies. In summary, the eligible studies screened and assessed in the present systematic review showed a good performance of cheiloscopy when the studies were analyzed separately; but in deeper analyses, especially those performed per lip pattern within the techniques, no evident differences were detected between males and females. The limitation of cheiloscopy is, therefore, corroborated by the final quantitative assessment via GRADE.
To date, the alleged contribution of cheiloscopy in forensic dentistry is merely superficial and highly relative. The potential error within the diagnostic accuracy of cheiloscopy would be close to 25%; in other words, nearly 386 participants sampled in the quantitative part of this review would have their sex wrongly classified from a sample of 1,547 individuals. Forensic dentistry itself is already a tool of relative applicability for human identification (it is not necessarily applicable in every single autopsy). In general, charred victims and skeletal remains constitute the main scenarios for a forensic odontologist. Authors might claim lip print applications to narrow disaster victim identification lists by sex, but in most of these cases bodies are not intact. If cheiloscopy studies are to be improved in the future, authors are encouraged to design more advanced analyses of the morphology of the human lips, to the point of having enough evidence to support the development of clinical databases and protocols for lip recording. From the perspective of forensic practice, this systematic review does not encourage the use of cheiloscopy as the sole tool for sexual dimorphism.

Conclusion
After revisiting 72 eligible studies with a pooled sample of 22,965 individuals, this systematic review revealed weak foundations for the use of lip print analysis for sexual dimorphism in forensic dentistry. The reduced pooled sample within the meta-analysis showed an average rate of wrong sex classification of nearly 25%. The studies were highly heterogeneous, as none of them followed proper EQUATOR guidelines for structuring methods and reporting data. GRADE analysis confirmed the low certainty of evidence, suggesting that cheiloscopy is not a reliable tool in practice when it comes to sexual dimorphism.

Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.